Reduce memory allocations by writing directly into the output stream (#461) #472
Conversation
Regarding commit 3d68f91, I'm not sure it is actually a good idea to make use of Jackson internal utilities like `ByteArrayBuilder`. Note also that new buffers are allocated dynamically as data is written into the `ByteArrayBuilder`. They are allocated on the heap directly and don't come from the `BufferRecycler` (see ByteArrayBuilder.java#L294). As the comment in the code says, the size of each newly allocated buffer is half the total size of what has been allocated so far (up to a maximum). Let's take an example: create a new `ByteArrayBuilder` and give it a `BufferRecycler`. A first byte array of 2000 bytes is allocated (cold start). Then we write 3001 bytes; here is what happens: the initial 2000-byte buffer is filled, then a second buffer of 1000 bytes (half of the 2000 bytes allocated so far) is created and filled, then a third buffer of 1500 bytes (half of the 3000 bytes allocated so far) is created to hold the last byte.
When done, you need to call `release()` to return buffers to the recycler, but only the last (tail) buffer is actually returned. Let's run the example a second time with the same recycler (the purpose being to reuse previous byte arrays and reduce memory allocation). We create a new `ByteArrayBuilder` that immediately asks the recycler for an initial buffer. Unfortunately, the recycler now holds an array of size 1500, which is LESS than the hardcoded minimum of 2000... So instead of returning the cached array, it discards it and creates a new one of 2000 bytes. The rest happens exactly as during the first, cold, run. Conclusion: we have not saved/reused anything at all! I'm puzzled, to be honest. I don't see how this mechanism can provide efficient caching... except when what you write never (or rarely) exceeds the hardcoded minimum (2000 bytes). My feeling is that we'd better write our own mechanism instead of relying on something that is specifically designed for use by Jackson internals only... Maybe I should create an issue on jackson-core and see what Tatu has to say about it (no doubt he will first tell me I should not use it).
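The block-growth policy described above can be sketched in a few lines. This is an illustrative simulation of the behaviour being discussed, not Jackson's actual code; the 2000-byte minimum comes from the discussion, while the maximum block size here is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockGrowthSketch {
    // Hardcoded minimum first-block size mentioned in the discussion above.
    static final int MIN_FIRST_BLOCK = 2000;
    // Illustrative cap on block size (assumption, not Jackson's actual value).
    static final int MAX_BLOCK_SIZE = 256 * 1024;

    /** Returns the sizes of the blocks allocated while writing the given number of bytes. */
    static List<Integer> blocksFor(int bytesToWrite) {
        List<Integer> blocks = new ArrayList<>();
        int totalAllocated = 0;
        int written = 0;
        while (written < bytesToWrite) {
            int size = blocks.isEmpty()
                    ? MIN_FIRST_BLOCK
                    // each new block is half of the total allocated so far, capped
                    : Math.min(totalAllocated / 2, MAX_BLOCK_SIZE);
            blocks.add(size);
            totalAllocated += size;
            written += size; // each block is filled before the next is allocated
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Writing 3001 bytes: 2000-byte block, then 1000, then 1500 (only 1 byte used)
        System.out.println(blocksFor(3001)); // [2000, 1000, 1500]
    }
}
```

Running it for 3001 bytes reproduces the scenario above: the tail block is 1500 bytes, which is exactly the undersized array that ends up in the recycler.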
Additional notes: (1) For information, in production we ingest approximately 15K events/s for one hour during a working day; the average JSON size is around … (2) I quite like this solution. I wonder if we need to go any further... After all, the rest of the chain is not optimised... look at the Layout for instance... it returns a String... and most Logback components are not designed to stream their output...
In general, the StreamingEncoder implementation looks great! I'm happy to incorporate it into logstash-logback-encoder.
The fact that it only returned the tail buffer was surprising to me. Thanks for the detailed analysis! Good stuff. A custom buffer recycler sounds perfectly reasonable, unless it is crazy complex. Want to submit it as an alternate PR for comparison?
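A simple form of such a custom recycler could keep one growable buffer per thread. This is only a sketch of the idea under discussion (the class name and shape are assumptions, not the project's actual `ThreadLocalBuffers` implementation); `ByteArrayOutputStream.reset()` clears the content but retains the backing array, so capacity is reused across events.

```java
import java.io.ByteArrayOutputStream;

public class ThreadLocalBufferSketch {
    // One buffer per thread: no contention, and the backing array survives
    // between log events instead of being reallocated each time.
    private static final ThreadLocal<ByteArrayOutputStream> BUFFER =
            ThreadLocal.withInitial(() -> new ByteArrayOutputStream(1024));

    /** Get this thread's buffer, cleared but with its grown capacity retained. */
    public static ByteArrayOutputStream acquire() {
        ByteArrayOutputStream buf = BUFFER.get();
        buf.reset(); // discard previous content, keep the backing array
        return buf;
    }
}
```

Unlike Jackson's recycler, there is no minimum-size check that could discard the cached array: whatever the buffer has grown to is kept for the next event on the same thread.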
Hi Phil... I'm back with this optimisation. A few words about what I did so far...
I made a small benchmark using JMH to compare this implementation with version 6.6. Here are the results:
Not bad... especially with regard to the positive impact on memory and GC.
This of course requires additional changes. I propose to cover it in a separate issue. The improvements provided by this PR already seem to be good enough ;-) Tell me what you think...
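For context, the core idea behind a reusable buffer can be sketched as follows. This is a minimal illustration only (the real `ReusableByteBuffer` in the PR's file list may differ substantially): the buffer behaves like an `OutputStream`, but `reset()` keeps the grown backing array so subsequent events reuse it instead of triggering fresh allocations.

```java
import java.io.OutputStream;

/** Minimal sketch (not the actual PR implementation) of a resettable, reusable buffer. */
public class ReusableByteBufferSketch extends OutputStream {
    private byte[] buf;
    private int count;

    public ReusableByteBufferSketch(int initialCapacity) {
        this.buf = new byte[initialCapacity];
    }

    @Override
    public void write(int b) {
        ensureCapacity(count + 1);
        buf[count++] = (byte) b;
    }

    @Override
    public void write(byte[] src, int off, int len) {
        ensureCapacity(count + len);
        System.arraycopy(src, off, buf, count, len);
        count += len;
    }

    private void ensureCapacity(int needed) {
        if (needed > buf.length) {
            // Grow by doubling; the enlarged array is kept across reset() calls.
            byte[] bigger = new byte[Math.max(buf.length * 2, needed)];
            System.arraycopy(buf, 0, bigger, 0, count);
            buf = bigger;
        }
    }

    /** Reset the write position but keep the (possibly grown) backing array. */
    public void reset() {
        count = 0;
    }

    public byte[] toByteArray() {
        byte[] out = new byte[count];
        System.arraycopy(buf, 0, out, 0, count);
        return out;
    }

    public int size() {
        return count;
    }
}
```

The benefit over `ByteArrayOutputStream` is mainly that the class is under the project's control, so pooling and sizing policies can be tuned to the logging workload.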
This looks really good!
The performance numbers you posted are impressive.
My comments are mostly minor. Overall it's a really great implementation.
Resolved (now outdated) review threads on:
- src/main/java/net/logstash/logback/util/ThreadLocalBuffers.java
- src/main/java/net/logstash/logback/composite/CompositeJsonFormatter.java
- src/main/java/net/logstash/logback/encoder/CompositeJsonEncoder.java
- src/main/java/net/logstash/logback/util/ReusableByteBuffer.java
- src/main/java/net/logstash/logback/util/ReusableByteBuffers.java
Thanks again for your work on this! Very cool improvement. (And thanks for addressing all my code review comments. I know I can be picky at times. ;) Cheers!
👍
I think the import order was all over the place when I originally enabled the checkstyle rules. I tried to create rules that matched the existing codebase as much as possible to minimize reformatting, and to "stop the bleeding". Having said that, I'm not against defining import order rules and reorganizing the existing imports. If there is a clear majority practice, I'd like to stick with it. But I'm less concerned about changing imports than reformatting code in general, because I rarely use git blame to look at imports. I'm usually looking at actual code when using git blame, and therefore am much more hesitant to reformat the actual code.
Reduce GC pressure by writing directly into the output stream instead of returning a byte array.
Introduce a new (internal) StreamingEncoder interface to be implemented by Encoders that support writing directly into the output stream instead of returning their results as a byte array.
Update both the AbstractLogstashTcpSocketAppender and the CompositeJsonEncoder to support this new interface. This should hopefully reduce the number of short-lived byte arrays created for each log event.
Review the test case and make use of actual instances instead of mocks (required to preserve state during tests; otherwise conditional operations like start/isStarted do not work as expected).
See #461 for more information.
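The contract such an interface could define can be sketched as follows. This is only an illustration of the idea (the interface body and the toy implementation below are assumptions, not the actual code from this PR): instead of an encoder returning a `byte[]` that the appender then copies to its stream, the event is written straight into the stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Sketch of a streaming-encoder contract: write the event, skip the byte[] round trip. */
interface StreamingEncoderSketch<E> {
    void encode(E event, OutputStream outputStream) throws IOException;
}

/** Toy implementation: writes the event's toString() followed by a newline. */
public class LineStreamingEncoder implements StreamingEncoderSketch<Object> {
    @Override
    public void encode(Object event, OutputStream out) throws IOException {
        out.write(event.toString().getBytes(StandardCharsets.UTF_8));
        out.write('\n');
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        new LineStreamingEncoder().encode("hello", out);
        System.out.print(out.toString(StandardCharsets.UTF_8)); // hello
    }
}
```

An appender can check `encoder instanceof StreamingEncoderSketch` and, when it matches, call `encode(event, stream)` directly, falling back to the classic byte-array path otherwise; this keeps the change backwards compatible with encoders that do not implement the interface.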